Computer and Modernization ›› 2010, Vol. 1 ›› Issue (11): 9-11,1. doi: 10.3969/j.issn.1006-2475.2010.11.003

• Algorithm Design and Analysis •

Semantic-based Automatic Text Classification Method

HU Xiao-hui (胡晓辉), XU Ye-ke (徐也可), LIU Bin (刘斌)

  1. Department of Information & Management Engineering, Jiangxi Vocational College of Mechanical & Electrical Technology, Nanchang 330013, China
  • Received: 2010-06-01 Revised: 1900-01-01 Online: 2010-11-25 Published: 2010-11-25

Abstract: Automatic text classification is the task of having a computer assign a document to its relevant categories, based on the document's content, within a predefined classification scheme. Most existing text classification algorithms are based on the vector space model (VSM) and therefore cannot fully express the semantic features of documents, which degrades classifier performance. To address this problem, this paper constructs a similarity matrix from the training documents, derives the topic information of each category from that matrix, and builds a classifier from this topic information. Finally, this classifier is combined with a classic classifier to determine the category of a text. Experiments show that the proposed method considerably improves classifier performance.

Key words: text classification, semantic features, vector space model (VSM), graphical model, algorithm
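
The pipeline outlined in the abstract (similarity matrix, per-category topic information, topic-based classifier, combination with a classic classifier) might be sketched roughly as follows. The abstract gives no formulas, so every concrete choice here is an assumption made for illustration and not the authors' method: cosine similarity over raw term frequencies, a similarity-weighted centroid as the "topic" of a category, a k-nearest-neighbour vote standing in for the classic classifier, and a mixing weight `alpha`. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed document representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_matrix(vectors):
    """Pairwise cosine similarities between training documents."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

def category_topics(train):
    """Assumed topic extraction: per category, a centroid in which each
    document is weighted by its total similarity to the category's documents."""
    by_label = defaultdict(list)
    for text, label in train:
        by_label[label].append(tf_vector(text))
    topics = {}
    for label, vecs in by_label.items():
        weights = [sum(row) for row in similarity_matrix(vecs)]
        topic = Counter()
        for weight, vec in zip(weights, vecs):
            for term, tf in vec.items():
                topic[term] += weight * tf
        topics[label] = topic
    return topics

def topic_scores(text, topics):
    """Topic-based classifier: similarity of the text to each category topic."""
    v = tf_vector(text)
    return {label: cosine(v, topic) for label, topic in topics.items()}

def knn_scores(text, train, k=3):
    """Stand-in 'classic' classifier: share of the k nearest neighbours per label."""
    v = tf_vector(text)
    nearest = sorted(train, key=lambda d: cosine(v, tf_vector(d[0])), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return {label: votes[label] / k for label in {l for _, l in train}}

def classify(text, train, topics, alpha=0.5):
    """Combine both classifiers with an assumed linear weighting."""
    ts = topic_scores(text, topics)
    ks = knn_scores(text, train)
    combined = {c: alpha * ts[c] + (1 - alpha) * ks.get(c, 0.0) for c in ts}
    return max(combined, key=combined.get)

# Toy training set, invented purely for illustration.
train = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("market investors buy shares", "finance"),
    ("team wins football match", "sports"),
    ("player scores goal in match", "sports"),
    ("coach praises team after goal", "sports"),
]
topics = category_topics(train)
print(classify("investors watch the stock market", train, topics))
```

In this sketch the topic vector favours documents that are typical of their category, which is one plausible reading of "topic information"; the paper's keywords also mention a graphical model, which this sketch does not attempt to reproduce.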

CLC Number: